$\newcommand{\xv}{\mathbf{x}} \newcommand{\Xv}{\mathbf{X}} \newcommand{\yv}{\mathbf{y}} \newcommand{\zv}{\mathbf{z}} \newcommand{\av}{\mathbf{a}} \newcommand{\Wv}{\mathbf{W}} \newcommand{\wv}{\mathbf{w}} \newcommand{\tv}{\mathbf{t}} \newcommand{\Tv}{\mathbf{T}} \newcommand{\muv}{\boldsymbol{\mu}} \newcommand{\sigmav}{\boldsymbol{\sigma}} \newcommand{\phiv}{\boldsymbol{\phi}} \newcommand{\Phiv}{\boldsymbol{\Phi}} \newcommand{\Sigmav}{\boldsymbol{\Sigma}} \newcommand{\Lambdav}{\boldsymbol{\Lambda}} \newcommand{\half}{\frac{1}{2}} \newcommand{\argmax}[1]{\underset{#1}{\operatorname{argmax}}} \newcommand{\argmin}[1]{\underset{#1}{\operatorname{argmin}}}$
The models we have been building are linear in the parameters $\wv$ and linear in the attributes (features) of the samples. We can make models that are nonlinear in the attributes by adding nonlinear functions of the original features.
Say we have a single feature for each sample. Our data matrix is $$ \begin{alignat*}{1} X &= \begin{bmatrix} x_0\\ x_1\\ \vdots \\ x_N \end{bmatrix} \end{alignat*} $$ We can add other powers of each $x$ value, say up to the fourth power. $$ \begin{alignat*}{1} X &= \begin{bmatrix} x_0 & x_0^2 & x_0^3 & x_0^4\\ x_1 & x_1^2 & x_1^3 & x_1^4\\ \vdots \\ x_N & x_N^2 & x_N^3 & x_N^4\\ \end{bmatrix} \end{alignat*} $$
This is simple to do in Python:
X = np.hstack((X, X**2, X**3, X**4))
Which of these powers of $x$ are useful? Looking at the magnitudes of the weights is helpful, as long as the input features are standardized. We can do more than this, though. If we build multiple models from multiple bootstrap samples of the training data, we can compute confidence intervals of the weight values. If zero is not included in the range of values between a weight's lower and upper 90% confidence limits, then we can say that we are 90% certain that the value of this weight is not zero. If the range does include zero, the corresponding feature is probably not useful. (A small example of this calculation appears below, after the bootstrap models are built.)
Here is some code that illustrates this whole process, including the use of the penalty on weight magnitudes controlled by $\lambda$.
In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
import random
In [3]:
def train(X, T, lamb=0):
    # Standardize the inputs, add a bias column of ones, and solve the
    # least squares problem with an optional penalty lamb on the weight magnitudes.
    means = X.mean(0)
    stds = X.std(0)
    n, d = X.shape
    Xs1 = np.insert((X - means) / stds, 0, 1, axis=1)
    lambDiag = np.eye(d + 1) * lamb
    lambDiag[0, 0] = 0                 # do not penalize the bias weight
    w = np.linalg.lstsq(Xs1.T @ Xs1 + lambDiag, Xs1.T @ T)[0]
    return {'w': w, 'means': means, 'stds': stds, 'lambda': lamb}

def use(model, X):
    # Standardize with the training means and stds, add the bias column, apply the weights.
    Xs1 = np.insert((X - model['means']) / model['stds'], 0, 1, axis=1)
    return Xs1 @ model['w']

def rmse(A, B):
    return np.sqrt(np.mean((A - B)**2))
Now, make a simple function of $x$. How about $f(x) = -1 + 0.1 x^2 - 0.02 x^3 + 0.5 n$, where $n$ is from a standard Normal distribution.
In [4]:
nSamples = 40
trainingFraction = 0.5
nModels = 1000
confidence = 90 # percent
lambdaw = 0.0 # penalty on weight magnitudes
X = np.hstack((np.linspace(0, 3, num=nSamples),
np.linspace(6, 10, num=nSamples))).reshape((2*nSamples,1))
# T = -1 + 1 * X + 2 * np.sin(X*2) + 0.55*np.random.normal(size=(2*nSamples,1))
T = -1 + 0 * X + 0.1 * X**2 - 0.02 * X**3 + 0.5*np.random.normal(size=(2*nSamples,1))
X.shape, T.shape
Out[4]:
In [5]:
plt.plot(X,T,'.-');
Let's add a nonlinear function of each feature. Here we add $\sin(x)$; a version that instead adds powers of $x$ up to the fifth is left commented out.
In [6]:
#Xf = np.hstack((X, X**2, X**3, X**4, X**5))
Xf = np.hstack((X, np.sin(X)))
Divide the data into training and testing sets, randomly. First, a quick demonstration of `random.sample`.
In [7]:
random.sample([1,2,3,4,5],5)
Out[7]:
In [17]:
nRows = Xf.shape[0]
nTrain = int(round(nRows*trainingFraction))
nTest = nRows - nTrain
allI = range(nRows)
trainI = random.sample(allI,nTrain)
testI = list(set(allI).difference(set(trainI)))
Xtrain = Xf[trainI,:]
Ttrain = T[trainI,:]
Xtest = Xf[testI,:]
Ttest = T[testI,:]
Xtrain.shape, Ttrain.shape, Xtest.shape, Ttest.shape
Out[17]:
In [18]:
plt.plot(Xtrain[:,0],Ttrain,'o',label='Train')
plt.plot(Xtest[:,0],Ttest,'ro',label='Test')
plt.xlim(-2,12)
plt.legend(loc='best');
Make models based on bootstrap samples of the training data. `models` will be a list of models, one for each bootstrap sample.
In [21]:
nModels = 100
models = []
for modeli in range(nModels):
    # Draw random sample (row) numbers, with repetition
    trainI = [random.randint(0, nTrain-1) for i in range(nTrain)]
    XtrainBoot = Xtrain[trainI, :]
    TtrainBoot = Ttrain[trainI, :]
    model = train(XtrainBoot, TtrainBoot, 0)
    models.append(model)
In [22]:
len(models)
Out[22]:
In [23]:
models[0]
Out[23]:
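The confidence intervals described earlier can now be computed from these bootstrap models. Here is a minimal sketch, assuming the `models` list and the `confidence` setting defined above; the names `W`, `lows`, and `highs` are only illustrative.
In [ ]:
# Stack all bootstrap weight vectors into one array: one row per model,
# one column per weight (bias weight first).
W = np.array([m['w'].flatten() for m in models])
# 90% interval for each weight: 5th and 95th percentiles over the models.
lows = np.percentile(W, (100 - confidence) / 2, axis=0)
highs = np.percentile(W, 100 - (100 - confidence) / 2, axis=0)
for i, (lo, hi) in enumerate(zip(lows, highs)):
    note = '' if lo <= 0 <= hi else '  interval excludes zero'
    print('w[{}]: [{:7.3f}, {:7.3f}]{}'.format(i, lo, hi, note))
A weight whose interval excludes zero is one we can be roughly 90% certain is not zero, so its feature is probably useful.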
Now we will apply all of the models to the test data.
In [29]:
YAll = []
for model in models:
    YAll.append(use(model, Xtest))
# YAll is nModels x nTest x 1. Make it nTest x nModels.
YAll = np.array(YAll).squeeze().T
Ytest = np.mean(YAll, axis=1)
# WATCH OUT!! Ttest is nTest x 1. Without flattening it, Ytest - Ttest
# broadcasts to an nTest x nTest matrix and the RMSE below is a wrong scalar.
Ttest = Ttest.flatten()
RMSEtest = np.sqrt(np.mean((Ytest - Ttest)**2))
print('Test RMSE is {:.4f}'.format(RMSEtest))
In [30]:
nPlot = 100
Xplot = np.linspace(0,13.5,nPlot).reshape((nPlot,1))
Xplotf = np.hstack((Xplot,np.sin(Xplot))) #Xplot**2,Xplot**3,Xplot**4,Xplot**5))
Ys = []
for model in models:
    Yplot = use(model, Xplotf)
    Ys.append(Yplot.flatten())
Ys = np.array(Ys).T
plt.figure(figsize=(10,10))
plt.plot(Xtrain[:,0],Ttrain,'o');
plt.plot(Xtest[:,0],Ttest,'ro');
plt.plot(Xplotf[:,0],Ys,alpha=0.2);
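The same percentile idea gives a confidence band on the predictions themselves. Here is a minimal sketch, using the `Ys` array just computed (one column of predictions per bootstrap model); `lower` and `upper` are illustrative names.
In [ ]:
# 90% band of the bootstrap predictions at each plotting point:
# 5th and 95th percentiles across the models (the columns of Ys).
lower = np.percentile(Ys, 5, axis=1)
upper = np.percentile(Ys, 95, axis=1)
plt.figure(figsize=(10, 6))
plt.fill_between(Xplot.flatten(), lower, upper, alpha=0.3, label='90% band')
plt.plot(Xplot.flatten(), Ys.mean(axis=1), label='mean prediction')
plt.plot(Xtrain[:, 0], Ttrain, 'o', label='Train')
plt.plot(Xtest[:, 0], Ttest, 'ro', label='Test')
plt.legend(loc='best');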
Now let's try some other data. Here is some data recorded from EMG sensors, with one column of resulting joint forces. The data comes from the UCI Machine Learning Repository; it has 4 channels of EMG and one of force, measured at the knee of 22 subjects.
In [31]:
!wget http://www.cs.colostate.edu/~anderson/cs480/notebooks/emg.txt
!head emg.txt
In [32]:
data = np.loadtxt('emg.txt')
T = data[:,-1:]
X = data[:,:-1]
X.shape,T.shape
Out[32]:
In [33]:
for i in range(4):
    plt.subplot(2, 2, i+1)
    plt.plot(X[:, i], T, '.')
In [34]:
plt.plot(X[:100,:]);
In [35]:
for i in range(4):
    plt.subplot(2, 2, i+1)
    plt.plot(np.abs(X[:, i]), np.abs(T), '.')
Now, let's try fitting a model to this data, using the original features and their squares.
In [36]:
Xf = np.hstack((X,X**2))
model = train(Xf,T)
predict = use(model, Xf)
print(rmse(predict,T))
In [37]:
X.shape
Out[37]:
In [38]:
result = []
for deg in [1, 2, 3]:
    # Build features with powers of X up to deg: X, X**2, ..., X**deg.
    Xf = X.copy()
    for d in range(2, deg + 1):
        Xf = np.hstack((Xf, X**d))
    err = rmse(use(train(Xf, T), Xf), T)
    result.append([deg, err])
result = np.array(result)
result
Out[38]:
In [39]:
Xf.shape
Out[39]:
In [40]:
plt.plot(result[:,0],result[:,1],'o-');
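These RMSE values are computed on the same data the models were trained on, so they cannot increase as more powers are added. Here is a minimal sketch of the same comparison judged on held-out data, reusing the `trainingFraction`, `train`, `use`, and `rmse` definitions from above; the split is new, so the numbers will vary from run to run.
In [ ]:
# Degree comparison again, but with a random train/test split.
nRows = X.shape[0]
nTrain = int(round(nRows * trainingFraction))
trainI = random.sample(range(nRows), nTrain)
testI = list(set(range(nRows)).difference(set(trainI)))
result = []
for deg in [1, 2, 3]:
    Xf = X.copy()
    for d in range(2, deg + 1):
        Xf = np.hstack((Xf, X**d))
    model = train(Xf[trainI, :], T[trainI, :])
    result.append([deg,
                   rmse(use(model, Xf[trainI, :]), T[trainI, :]),   # train RMSE
                   rmse(use(model, Xf[testI, :]), T[testI, :])])    # test RMSE
np.array(result)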